减少甲烷排放对于缓解全球变暖至关重要。为了将甲烷排放归因于其来源,有必要综合的甲烷源基础设施数据集。深入学习远程感知的图像的最新进展有可能识别甲烷源的位置和特征,但是缺乏公开可用的数据,可以使机器学习研究人员和从业人员能够构建自动映射方法。为了帮助填补这一空白,我们在美国构建了一个称为Meter-ML的多传感器数据集,该数据集包含86,625个地理参考的NAIP,Sentinel-1和Sentinel-2图像,并在美国标记为有甲烷源设施,包括甲烷源设施,包括集中动物喂养操作,,,,,,,包括浓缩动物喂养操作,煤矿,垃圾填埋场,天然气加工厂,炼油厂和石油末端以及废水处理厂。我们尝试各种模型,以利用不同的空间分辨率,空间足迹,图像产品和光谱带。我们发现,我们的最佳模型在确定浓缩动物喂养操作的精确召回曲线下达到了一个面积,在专家标签的测试集上,用于识别浓缩动物饲养操作,用于油炼油厂和石油末端0.821,这表明有可能进行大规模映射。我们在https://stanfordmlgroup.github.io/projects/meter-ml/上免费提供仪表-ML,以支持自动化甲烷源映射的未来工作。
translated by 谷歌翻译
In the Earth's magnetosphere, there are fewer than a dozen dedicated probes beyond low-Earth orbit making in-situ observations at any given time. As a result, we poorly understand its global structure and evolution, the mechanisms of its main activity processes, magnetic storms, and substorms. New Artificial Intelligence (AI) methods, including machine learning, data mining, and data assimilation, as well as new AI-enabled missions will need to be developed to meet this Sparse Data challenge.
translated by 谷歌翻译
Large language models (LLMs) have demonstrated impressive capabilities in natural language understanding and generation, but the quality bar for medical and clinical applications is high. Today, attempts to assess models' clinical knowledge typically rely on automated evaluations on limited benchmarks. There is no standard to evaluate model predictions and reasoning across a breadth of tasks. To address this, we present MultiMedQA, a benchmark combining six existing open question answering datasets spanning professional medical exams, research, and consumer queries; and HealthSearchQA, a new free-response dataset of medical questions searched online. We propose a framework for human evaluation of model answers along multiple axes including factuality, precision, possible harm, and bias. In addition, we evaluate PaLM (a 540-billion parameter LLM) and its instruction-tuned variant, Flan-PaLM, on MultiMedQA. Using a combination of prompting strategies, Flan-PaLM achieves state-of-the-art accuracy on every MultiMedQA multiple-choice dataset (MedQA, MedMCQA, PubMedQA, MMLU clinical topics), including 67.6% accuracy on MedQA (US Medical License Exam questions), surpassing prior state-of-the-art by over 17%. However, human evaluation reveals key gaps in Flan-PaLM responses. To resolve this we introduce instruction prompt tuning, a parameter-efficient approach for aligning LLMs to new domains using a few exemplars. The resulting model, Med-PaLM, performs encouragingly, but remains inferior to clinicians. We show that comprehension, recall of knowledge, and medical reasoning improve with model scale and instruction prompt tuning, suggesting the potential utility of LLMs in medicine. Our human evaluations reveal important limitations of today's models, reinforcing the importance of both evaluation frameworks and method development in creating safe, helpful LLM models for clinical applications.
translated by 谷歌翻译
Large language models (LLMs) have been shown to be able to perform new tasks based on a few demonstrations or natural language instructions. While these capabilities have led to widespread adoption, most LLMs are developed by resource-rich organizations and are frequently kept from the public. As a step towards democratizing this powerful technology, we present BLOOM, a 176B-parameter open-access language model designed and built thanks to a collaboration of hundreds of researchers. BLOOM is a decoder-only Transformer language model that was trained on the ROOTS corpus, a dataset comprising hundreds of sources in 46 natural and 13 programming languages (59 in total). We find that BLOOM achieves competitive performance on a wide variety of benchmarks, with stronger results after undergoing multitask prompted finetuning. To facilitate future research and applications using LLMs, we publicly release our models and code under the Responsible AI License.
translated by 谷歌翻译
The choice of geometric space for knowledge graph (KG) embeddings can have significant effects on the performance of KG completion tasks. The hyperbolic geometry has been shown to capture the hierarchical patterns due to its tree-like metrics, which addressed the limitations of the Euclidean embedding models. Recent explorations of the complex hyperbolic geometry further improved the hyperbolic embeddings for capturing a variety of hierarchical structures. However, the performance of the hyperbolic KG embedding models for non-transitive relations is still unpromising, while the complex hyperbolic embeddings do not deal with multi-relations. This paper aims to utilize the representation capacity of the complex hyperbolic geometry in multi-relational KG embeddings. To apply the geometric transformations which account for different relations and the attention mechanism in the complex hyperbolic space, we propose to use the fast Fourier transform (FFT) as the conversion between the real and complex hyperbolic space. Constructing the attention-based transformations in the complex space is very challenging, while the proposed Fourier transform-based complex hyperbolic approaches provide a simple and effective solution. Experimental results show that our methods outperform the baselines, including the Euclidean and the real hyperbolic embedding models.
translated by 谷歌翻译
与置换不变的代理框架的合作多元化学习(MARL)在现实世界应用中取得了巨大的经验成功。不幸的是,由于许多代理商的诅咒以及对现有作品中的关系推理的有限探索,对这个MARL问题的理论理解缺乏。在本文中,我们验证了变压器是否实现了复杂的关系推理,并提出和分析了与变压器近似器的无模型和基于模型的离线MARL算法。我们证明,基于模型和基于模型的算法的次级次数差距分别与代理数量分别独立于和对数,这减轻了许多试剂的诅咒。这些结果是变压器的新概括误差结合的结果以及对变压器系统动力学的最大似然估计(MLE)的新分析。我们的基于模型的算法是第一个明确利用代理的置换不变性的可证明有效的MARL算法。
translated by 谷歌翻译
自动扬声器验证(ASV)已在现实生活中广泛用于身份认证。但是,随着语音转换的快速发展,语音合成算法和记录设备质量的提高,ASV系统很容易受到欺骗攻击。近年来,有关合成和重播语音检测的许多作品,研究人员提出了许多基于手工制作的特征的反欺骗方法,以提高合成和重播语音检测系统的准确性和鲁棒性。但是,使用手工制作的功能而不是原始波形将丢失某些信息进行抗旋转,这将降低系统的检测性能。受图像分类任务中Convnext的有希望的性能的启发,我们将Convnext网络体系结构相应地扩展到SPOOF攻击任务,并提出了端到端的反欺骗模型。通过将扩展体系结构与频道注意块相结合,提出的模型可以专注于最有用的语音表示子频段,以改善反欺骗性的性能。实验表明,对于ASVSPOOF 2019 LA评估数据集和PA评估数据集,我们提出的最佳单个系统可以达到1.88%和2.79%的误差率,这证明了该模型的抗SpoFofing能力。
translated by 谷歌翻译
ICECUBE是一种用于检测1 GEV和1 PEV之间大气和天体中微子的光学传感器的立方公斤阵列,该阵列已部署1.45 km至2.45 km的南极的冰盖表面以下1.45 km至2.45 km。来自ICE探测器的事件的分类和重建在ICeCube数据分析中起着核心作用。重建和分类事件是一个挑战,这是由于探测器的几何形状,不均匀的散射和冰中光的吸收,并且低于100 GEV的光,每个事件产生的信号光子数量相对较少。为了应对这一挑战,可以将ICECUBE事件表示为点云图形,并将图形神经网络(GNN)作为分类和重建方法。 GNN能够将中微子事件与宇宙射线背景区分开,对不同的中微子事件类型进行分类,并重建沉积的能量,方向和相互作用顶点。基于仿真,我们提供了1-100 GEV能量范围的比较与当前ICECUBE分析中使用的当前最新最大似然技术,包括已知系统不确定性的影响。对于中微子事件分类,与当前的IceCube方法相比,GNN以固定的假阳性速率(FPR)提高了信号效率的18%。另外,GNN在固定信号效率下将FPR的降低超过8(低于半百分比)。对于能源,方向和相互作用顶点的重建,与当前最大似然技术相比,分辨率平均提高了13%-20%。当在GPU上运行时,GNN能够以几乎是2.7 kHz的中位数ICECUBE触发速率的速率处理ICECUBE事件,这打开了在在线搜索瞬态事件中使用低能量中微子的可能性。
translated by 谷歌翻译
由于生成对抗网络(GAN)的突破,3D可控制的肖像合成已大大提高。但是,用精确的3D控制操纵现有的面部图像仍然具有挑战性。虽然连接gan倒置和3D感知,但噪声到图像是一种直接的解决方案,但它效率低下,可能导致编辑质量明显下降。为了填补这一空白,我们提出了3D-FM GAN,这是一个专门为3D可控制的面部操作设计的新型有条件GAN框架,并且在端到端学习阶段后不需要任何调整。通过小心地编码输入面图像和3D编辑的基于物理的渲染,我们的图像生成器提供了高质量,具有身份的3D控制面部操纵。为了有效地学习这种新颖的框架,我们制定了两种基本的训练策略和一种新颖的乘法共同调制体系结构,可在天真的方案上显着改善。通过广泛的评估,我们表明我们的方法在各种任务上的表现优于先前的艺术,具有更好的编辑性,更强的身份保存和更高的照片真实性。此外,我们在大型姿势编辑和室外图像上展示了设计更好的概括性。
translated by 谷歌翻译
鉴于HEP研究的核心,数据科学(DS)和机器学习(ML)在高能量物理学(HEP)中的作用增长良好和相关。此外,利用物理数据固有的对称性激发了物理信息的ML作为计算机科学研究的充满活力的子场。 HEP研究人员从广泛使用的材料中受益匪浅,可用于教育,培训和劳动力开发。他们还为这些材料做出了贡献,并为DS/ML相关的字段提供软件。物理部门越来越多地在DS,ML和物理学的交集上提供课程,通常使用HEP研究人员开发的课程,并涉及HEP中使用的开放软件和数据。在这份白皮书中,我们探讨了HEP研究与DS/ML教育之间的协同作用,讨论了此交叉路口的机会和挑战,并提出了将是互惠互利的社区活动。
translated by 谷歌翻译